CONTEXT

The Tanzanian tourism sector plays a significant role in the Tanzanian economy, contributing about 17% to the country’s GDP and 25% of all foreign exchange revenues. The sector, which provides direct employment for more than 600,000 people and up to 2 million people indirectly, generated approximately $2.4 billion in 2018 according to government statistics. Tanzania received a record 1.1 million international visitor arrivals in 2014, mostly from Europe, the US and Africa. Tanzania is the only country in the world which has allocated more than 25% of its total area for wildlife, national parks, and protected areas. There are 16 national parks in Tanzania, 28 game reserves, 44 game-controlled areas, two marine parks and one conservation area.

OBJECTIVE

The objective of this project is to explore the data and build a linear regression model that predicts the spending behaviour of tourists visiting Tanzania. The model can be used by tour operators and the Tanzania Tourism Board to help tourists from across the world estimate their expenditure before visiting Tanzania.

IMPORTING LIBRARIES AND DATASET

DESCRIBING AND UNDERSTANDING THE DATASET

There is no duplicated data in the dataset.

There are missing values in the travel_with, total_female, total_male and most_impressing columns of the dataset.

Let's look at the rows with null values in each of these columns.
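The duplicate and missing-value checks described above can be sketched as follows. This is a minimal illustration on a made-up toy DataFrame, not the notebook's original cell; the column names follow the ones mentioned in the text and all values are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the tourism dataset; every value here is made up.
df = pd.DataFrame({
    "travel_with": ["Alone", None, "Spouse", "Friends/Relatives"],
    "total_female": [1.0, 0.0, np.nan, 2.0],
    "total_male": [1.0, np.nan, 1.0, 0.0],
    "most_impressing": ["Wildlife", "Friendly People", None, "Wildlife"],
    "total_cost": [674602.5, 3214906.5, 3315000.0, 7790250.0],
})

# Number of fully duplicated rows.
n_duplicates = int(df.duplicated().sum())

# Missing values per column, keeping only columns that have any.
missing_per_column = df.isnull().sum()
columns_with_missing = missing_per_column[missing_per_column > 0].index.tolist()

# Rows where a given column is null, e.g. travel_with.
rows_missing_travel_with = df[df["travel_with"].isnull()]

print("duplicated rows:", n_duplicates)
print("columns with missing values:", columns_with_missing)
print(rows_missing_travel_with)
```

The same boolean mask, `df[col].isnull()`, can be reused for each of the other columns to inspect their incomplete rows.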

Describe the dataset

The mean, standard deviation, minimum, maximum and percentiles of the dataset are described above.

FILLING NAN VALUES OF OUR DATASET

All NaN values have been filled.

After filling the NaN values, there is still no duplicated data in the dataset.

After filling, the mean and standard deviation of the total_female and total_male columns differ only slightly from their values before filling.
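The text does not state which fill strategy was used, so the sketch below makes an assumption: categorical columns are filled with their mode and numeric columns with their median. The toy values are invented; the point is only to show the fill and the before/after comparison of the summary statistics.

```python
import numpy as np
import pandas as pd

# Toy data with gaps in the four columns named in the text (values are made up).
df = pd.DataFrame({
    "travel_with": ["Alone", None, "Alone", "Spouse"],
    "total_female": [1.0, 0.0, np.nan, 2.0],
    "total_male": [1.0, np.nan, 1.0, 0.0],
    "most_impressing": ["Wildlife", "Wildlife", None, "Friendly People"],
})

filled = df.copy()

# Assumption: fill categorical columns with the most frequent value.
for col in ["travel_with", "most_impressing"]:
    filled[col] = filled[col].fillna(filled[col].mode()[0])

# Assumption: fill numeric columns with the median.
for col in ["total_female", "total_male"]:
    filled[col] = filled[col].fillna(filled[col].median())

remaining_nan = int(filled.isnull().sum().sum())

# Compare mean and standard deviation before and after filling.
numeric_cols = ["total_female", "total_male"]
comparison = pd.concat(
    {"before": df[numeric_cols].agg(["mean", "std"]),
     "after": filled[numeric_cols].agg(["mean", "std"])},
    axis=1,
)
print("remaining NaN:", remaining_nan)
print(comparison)
```

Filling with a central value leaves the mean close to its original value but tends to shrink the standard deviation a little, which is one possible reason for a slight shift in these statistics.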

PROFILE REPORT

EXPLORATORY DATA ANALYSIS

UNIVARIATE ANALYSIS FOR NUMERICAL DATA ON DATASET

WHAT IS THE INFO SOURCE OF TOURISTS?

The most common information source for tourists visiting Tanzania is travel agents and tour operators. Few tourists learned about Tanzania through media sources.

TOTAL FEMALE

The histogram shows that the most frequent values are 0 and 1, which account for about 34.7% and 50.30% of the records respectively. The mean is 0.927.

TOTAL MALE

The histogram shows that the most frequent values are 0 and 1, which account for about 23.7% and 61.6% of the records respectively. The mean is 1.0098.
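Percentage shares like the ones quoted for total_female and total_male can be obtained with a normalized value count. The sketch below uses a small invented series, so its numbers do not match the dataset.

```python
import pandas as pd

# Toy stand-in for the total_male column (made-up values).
total_male = pd.Series([1, 1, 0, 2, 1, 0, 1, 3, 1, 1], name="total_male")

# Share of each distinct value, as a percentage of all records.
share = total_male.value_counts(normalize=True).mul(100).round(1)
mean_value = total_male.mean()

print(share)
print("mean:", mean_value)
```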

NIGHT MAINLAND

The night_mainland column has a minimum and maximum value of 0 and 145 respectively. The mean of night_mainland is 8.488.

NIGHT ZANZIBAR

The night_zanzibar column has a minimum and maximum value of 0 and 61 respectively. The mean of night_zanzibar is 2.304, with a standard deviation of 4.277.

TOTAL COST

The total cost has a minimum and maximum value of 49000 and 99532875 respectively.

What are the top 5 countries with the highest spending?

Which age group are the highest spenders, and who are the overall highest spenders by travel_with?

The 45-64 age group are the highest spenders, while the 1-24 age group are the lowest spenders.

The pie chart shows that the largest age group is 25-44 and the smallest is 65+.

The 25-44 age group has the highest count of tourists who travelled alone, with friends/relatives, or with a spouse, while the 65+ age group has the lowest count of tourists who travelled with a spouse and children.

Tourists who travel with their spouse are the highest spenders.

Tourists aged 65+ who travel with a spouse spend the most, while tourists in the 25-44 age group who travel alone spend the least.
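Comparisons like the ones above come from grouping total_cost by age_group and travel_with. The text does not say whether spending was compared by mean or by sum, so the sketch below assumes the mean; the toy rows are invented and do not reproduce the reported findings.

```python
import pandas as pd

# Toy rows (made-up values) with the three columns used in this comparison.
df = pd.DataFrame({
    "age_group": ["25-44", "25-44", "45-64", "45-64", "65+", "1-24"],
    "travel_with": ["Alone", "Spouse", "Spouse", "Alone", "Spouse", "Alone"],
    "total_cost": [100.0, 300.0, 500.0, 200.0, 900.0, 50.0],
})

# Assumption: "highest spenders" is measured by mean total_cost per group.
mean_by_age = (
    df.groupby("age_group")["total_cost"].mean().sort_values(ascending=False)
)

# Mean spending for every (age_group, travel_with) combination.
mean_by_pair = df.groupby(["age_group", "travel_with"])["total_cost"].mean()

top_age_group = mean_by_age.idxmax()
top_pair = mean_by_pair.idxmax()
lowest_pair = mean_by_pair.idxmin()

print(mean_by_age)
print("highest-spending combination:", top_pair)
print("lowest-spending combination:", lowest_pair)
```

Swapping `.mean()` for `.sum()` ranks groups by total revenue instead, which can give a different ordering when group sizes differ a lot.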

What is the most preferred payment mode among tourists?

The most preferred payment mode among tourists is cash.

Highlight the aspects of tourism that are more profitable and worth investing in

Most people who visit Tanzania come for leisure and holidays, mainly for wildlife tourism.

The more profitable aspects of tourism, and those worth investing in, are diving and sport fishing, which are the activities tourists spend the most on.

What is the average number of nights a tourist spends in mainland Tanzania?

Tourists spent an average of eight nights in mainland Tanzania.

What is the average number of nights a tourist spends in Zanzibar?

Tourists spent an average of two nights in Zanzibar.

What is the most sought-after food option among tourists?

The majority of people who visit Tanzania do not opt for package food.

Tourists who travel to Tanzania for leisure and holidays have the highest counts both for arranging package food and for not arranging it.

Tourists who are in Tanzania for wildlife tourism have high counts both for package food and for no package food.

HEATMAP

The heatmap shows the correlation between the variables in the dataset.

A pairplot is used to explore the relationships between the numerical features in the dataset.

Data Preprocessing

CHECKING MULTICOLLINEARITY

The heatmap shows the correlations between the numerical variables in corrmatrix.

No pair of features is collinear at the set threshold.
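A threshold check like this one can be sketched as below. The threshold used in the analysis is not stated, so the value 0.8 is an assumption, and the toy feature values are invented. The check looks only at the upper triangle of the absolute correlation matrix so that each pair is considered once.

```python
import numpy as np
import pandas as pd

# Toy numerical features (made-up values) named after columns in the text.
X = pd.DataFrame({
    "total_female": [1, 0, 2, 1, 0, 3],
    "total_male": [1, 1, 0, 2, 1, 0],
    "night_mainland": [7, 14, 3, 10, 0, 21],
    "night_zanzibar": [0, 7, 4, 0, 5, 2],
})

THRESHOLD = 0.8  # hypothetical cut-off; the original threshold is not given

# Absolute pairwise Pearson correlations.
corrmatrix = X.corr().abs()

# Keep only the strict upper triangle so each pair appears once.
mask = np.triu(np.ones(corrmatrix.shape, dtype=bool), k=1)
upper = corrmatrix.where(mask)

# Columns that correlate above the threshold with an earlier column.
to_drop = [col for col in upper.columns if (upper[col] > THRESHOLD).any()]

print(corrmatrix.round(2))
print("features above threshold:", to_drop)
```

An empty `to_drop` list corresponds to the situation described above, where no feature needs to be removed.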

Building a Regression Model

Evaluate Model Performances

The linear regression model performs poorly and is not a good model for this dataset.
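Model performance here is judged by the R2 score, which measures the share of variance in the target that the model explains. The sketch below computes it by hand for an ordinary least-squares fit on invented toy data; `r2_score_manual` is a hypothetical helper written for this illustration, not a function from the notebook.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Toy data (made up): one noisy feature and a target.
nights = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
cost = np.array([3.0, 1.0, 4.0, 2.0, 6.0, 3.0])

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones_like(nights), nights])
coef, *_ = np.linalg.lstsq(X, cost, rcond=None)
pred = X @ coef

r2 = r2_score_manual(cost, pred)
print("intercept, slope:", coef)
print("R2:", round(float(r2), 3))
```

A score near 1 means the predictions track the target closely, while a score near 0 means the model does little better than always predicting the mean.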

Testing Linear Model Assumptions

Let's clean the data and fill the NaN values.

Actionable Insights and Recommendations

A linear regression model was used for this study and it gave an R2 score of 0.279. The model performs poorly and is not a good fit for this dataset. We therefore recommend that other models be used on the dataset to achieve better performance.